Abstract

Implementors of compilers, program refactorers, theorem provers, 
proof checkers, and other systems that manipulate syntax know that 
dealing with name binding is difficult to do well. Operations such as 
α-equivalence and capture-avoiding substitution seem simple, yet
subtle bugs often go undetected. Furthermore, their implementa- 
tions are tedious, requiring "boilerplate" code that must be updated 
whenever the object language definition changes. 

Many researchers have therefore sought to specify binding syn- 
tax declaratively, so that tools can correctly handle the details be- 
hind the scenes. This idea has been the inspiration for many new 
systems (such as Beluga, Delphin, FreshML, FreshOCaml, Cαml,
FreshLib, and Ott) but there is still room for improvement in ex- 
pressivity, simplicity and convenience. 

In this paper, we present a new domain-specific language, UN- 
BOUND, for specifying binding structure. Our language is particu- 
larly expressive — it supports multiple atom types, pattern binders, 
type annotations, recursive binders, and nested binding (necessary 
for telescopes, a feature found in dependently-typed languages). 
However, our specification language is also simple, consisting of 
just five basic combinators. We provide a formal semantics for this 
language derived from a locally nameless representation and prove 
that it satisfies a number of desirable properties. 

We also present an implementation of our binding specification 
language as a GHC Haskell library implementing an embedded do- 
main specific language (EDSL). By using Haskell type constructors 
to represent binding combinators, we implement the EDSL suc- 
cinctly using datatype-generic programming. Our implementation 
supports a number of features necessary for practical programming, 
including flexibility in the treatment of user-defined types, best- 
effort name preservation (for error messages), and integration with 
Haskell's monad transformer library. 

Categories and Subject Descriptors D.2.3 [Coding Tools and
Techniques]; D.1.1 [Applicative (Functional) Programming]; E.1
[Data Structures]

General Terms Algorithms, Languages. 

Keywords generic programming, Haskell, name binding, patterns 



1. Introduction 

Name binding is one of the most annoying parts of language imple- 
mentations. Although functional programming languages such as 
Haskell and ML excel at the implementation of type checkers, com- 
pilers, and interpreters, there is an impedance mismatch between 
the free structure provided by algebraic datatypes and the syntax 
identified up to α-equivalence that we actually want to model.
Although there are many techniques for implementing name binding,
they require subtle invariants that pervade the system. Implementa- 
tion flaws cause bugs that can be quite difficult to track down. Fur- 
thermore, the implementations themselves are tedious, requiring 
boilerplate code that must be maintained as the implemented lan- 
guage evolves. While such boilerplate is straightforward, it causes 
friction for developers who just want to get the job done. 

And all this work is for something so "obvious" that it is often 
elided from language definitions! 

There has been much research towards solving this problem. 
Recently introduced languages and tools provide primitive sup- 
port for variable-binding, based on first-order ([6, 19, 28, 29]) and 
higher-order representations ([16, 18]). These tools handle the de- 
tails behind the scenes, relieving programmers of the tedium and 
subtle bugs described above. However, these tools must also satisfy 
the practical needs of programmers, and here they fall short: 

Expressiveness: These tools provide binding specification lan- 
guages that specify what variables are bound where. As unary lex- 
ical scoping (binding a single variable in a single location) is not 
sufficient for many applications, many of these tools support a lan- 
guage for patterns in binding specifications. Despite this flexibility, 
it is still not enough — there are patterns that we would like to use 
that cannot be defined by existing specification languages. 

Availability: Programmers want to write code in their language, 
and they want to do it directly. Tools that are wrappers for existing 
languages are preferred to completely new systems, but such tools 
still require update with each new version of the language. Libraries 
are more stable and have the added benefit of simple distribution, 
some degree of portability and familiar syntax. 

Choice of implementations: These tools each provide only 
one implementation for any given binding specification. How- 
ever, name binding involves a number of different operations and 
some implementations may favor one over the other. Programmers 
should be able to swap out implementations (or write their own) if 
they find one that works better with their application. 

In this paper, we present a new domain-specific language, UN- 
BOUND, for specifying binding structure that addresses these is- 
sues. Concretely, our contributions are as follows: 
• We describe a small, compositional set of abstract combinators
which form the entire basis for UNBOUND. This interface
succinctly characterizes our specification language.

• We show, via examples (§ 2), that UNBOUND is nonetheless
expressive. In particular, it supports multiple atom types, pattern
binders, type annotations, recursive binders, and nested binders.
The last are necessary to model telescopes and are not
supported by any existing specification language.

• We give a formal semantics for our specification language (§ 4) 
based on a locally nameless representation and prove its cor- 
rectness (§ 5). Our choice of representation leads to a simple 
semantics and straightforward metatheory. Alternative mean- 
ings are also possible; the simplicity of ours makes it a good 
reference for more sophisticated implementations. 

• We have implemented our framework as a Haskell library (§ 6), 
using Haskell's generic programming support to automatically 
derive standard operations. Our library is available for down- 
load from Hackage, 1 along with extensive documentation and 
examples. (Note GHC 7 is required.) 

2. The Unbound Specification Language 

We begin with a simple UNBOUND specification. 2 Functional pro- 
grammers are accustomed to using algebraic datatypes to specify 
the abstract syntax of a programming language. UNBOUND intro- 
duces type combinators that encode binding structure into the alge- 
braic datatype itself. For example, to represent the untyped lambda 
calculus, we use the E datatype below: 



    type N = Name E
    data E = Var N
           | Lam (Bind N E)
           | App E E



The new abstract type Name represents variables, and is in- 
dexed by the type of values which can be substituted for them (here, 
E). For convenience, we define N as a synonym for Name E.
Lambda abstractions are represented using the type Bind N E, 
indicating a name paired with an expression in which the name is 
bound. Application does not involve binding, so it is simply a pair 
of E values as expected. 

UNBOUND uses this datatype definition to derive standard operations
for working with syntax, such as α-equivalence, free variable
calculation, and capture-avoiding substitution. For example, sup- 
pose we want to implement parallel reduction for untyped lambda 
calculus terms. This operation looks throughout a term for β- and
η-reductions, even under lambda abstractions, transforming it into
a simpler form. An implementation is shown in Figure 1. The
signatures for the UNBOUND-derived operations that this code relies
on are at the top of the figure. All of these functions are
automatically derived by UNBOUND.

The function red has three cases. The Var case is trivial. The
Lam case must handle the possibility of η-reduction, so we must
break the lambda into its two constituent parts: its bound variable
and its body. Note that the type Bind N E is abstract, so we
cannot use pattern matching to extract its components. Instead, the
monadic unbind operation decomposes the binding, ensuring that
the name x does not conflict with other names currently in scope.

Once the body of the lambda expression has been reduced, the
code checks to see if it can do an η-reduction. This is possible when
the body is exactly the application of some other term e'' to the
variable x, where x does not appear free in e''. If an η-reduction is
not possible, the binding is reformed using the bind constructor for
the Bind N E type. A similar unbinding occurs in the application
case, when a β-reduction has been detected, followed by an
invocation of a capture-avoiding substitution operation also provided
by UNBOUND.



UNBOUND operations used in this example:

    bind   :: N -> E -> Bind N E
    unbind :: Fresh m => Bind N E -> m (N, E)
    fv     :: E -> Set N
    subst  :: N -> E -> E -> E

Parallel Reduction:

    red :: Fresh m => E -> m E
    red (Var x) = return (Var x)
    red (Lam b) = do
      (x, e) <- unbind b
      e' <- red e
      case e' of -- apply the eta-rule: (\x. e x) = e
        App e'' (Var y) | x == y && not (x `member` fv e'') -> return e''
        _ -> return (Lam (bind x e'))
    red (App e1 e2) = do
      e1' <- red e1
      case e1' of -- apply the beta rule: (\x. e) t = e[t/x]
        Lam b -> do
          (x, e') <- unbind b
          e2' <- red e2
          return (subst x e2' e')
        _ -> do
          e2' <- red e2
          return (App e1' e2')

Figure 1. Parallel reduction for E



    T ∈ 𝒯 ::= Name T       Names for Ts
            | R            Regular datatype containing only terms
            | Bind P T     Bind pattern P in body T



1 http://hackage.haskell.org/package/unbound/

2 While our examples are presented in Haskell, using our Haskell library, 
the examples themselves are language neutral. 



    P ∈ 𝒫 ::= Name T       Single binding name
            | R_P          Regular datatype containing only patterns
            | Embed T      Embedded term (§ 3.1)
            | Rebind P P   Nested binding pattern (§ 3.2)
            | Rec P        Recursive binding pattern (§ 3.3)



Figure 2. UNBOUND type combinators 



At this point, one may ask: Where do these operations come
from? What do they mean? How do we know that the code given
for red correctly implements parallel reduction for the lambda
calculus? The first question we answer in § 6 where we discuss
the Haskell implementation of UNBOUND. The other questions 
motivate our semantics (§ 4) and the theorems we choose to prove 
about it (§ 5). 



3. Beyond single binding 

A key feature of UNBOUND is that programmers are not limited to 
binding a single variable at a time. Instead, the bind constructor 
takes a pattern of variables, and abstracts over all of them. A large 
class of types may be used as patterns. As a simple example, lists of 
names are patterns. If we wanted to allow the syntax λx y z → ...
as a convenient shorthand for λx → λy → λz → ..., where x, y,
and z are distinct names, 3 we could change our definition of E to
the following:

    data E = Var N
           | Lam (Bind [N] E)
           | App E E

In general, UNBOUND uses two sorts of types: those that may be 
used as patterns, where names are binding occurrences, and those 
that are terms, where names are references to binding sites. Figure 2 
summarizes these two classes, written P and T respectively. The 
Bind type combinator takes a pattern type as its first argument and 
a term type as its second argument and returns a term type. Other 
term types include Name (representing free variables) and regular 
datatypes — those built using unit, base types, sums, products, and 
least fixpoint — that contain only term types. By convention, we use 
the metavariable P for pattern types and T for term types. 

The expressiveness of UNBOUND is determined by P, the col- 
lection of types that can be used as patterns. These types include 
Names, of course, as well as regular datatypes that contain only 
other pattern types. This means that some types, such as Int and
String, can be used as both terms and patterns. We describe the 
three remaining UNBOUND pattern combinators (Embed, Rebind 
and Rec) in more detail in the next subsections. 

As a more sophisticated example of pattern binding, consider 
adding pattern matching to the E language with a case statement. 
Each branch is encoded as Bind Pat E, where Pat is a new 
datatype representing object-language patterns. Every name occur- 
ring in a Pat will be bound in the respective body. 

    data Pat = PVar N | PCon String [Pat]

    data E = ...
           | Con String [E]       -- data constructors
           | Case E [Bind Pat E]  -- pattern matching

It is not hard to check that Pat is a valid pattern type (since 
it contains only Names and Strings), justifying its use as the first 
argument to Bind. 
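
To make this concrete, here is a minimal stand-alone sketch that lists the names an object-language pattern binds and checks that no name is repeated (UNBOUND requires a single pattern to be linear). It is a model, not the UNBOUND API: names are plain Strings rather than Name E, and patBinders and isLinear are hypothetical helper names introduced only for this illustration.

```haskell
import Data.List (nub)

-- Stand-alone model of the Pat type; String stands in for Name E (assumption).
data Pat = PVar String | PCon String [Pat]
  deriving (Eq, Show)

-- All names occurring in a pattern, left to right. Each is a binding
-- occurrence, so these are the names bound in the body of the branch.
patBinders :: Pat -> [String]
patBinders (PVar x)    = [x]
patBinders (PCon _ ps) = concatMap patBinders ps

-- A pattern is linear when no name occurs more than once.
isLinear :: Pat -> Bool
isLinear p = let xs = patBinders p in nub xs == xs
```

For instance, the pattern PCon "Cons" [PVar "x", PVar "xs"] binds x and xs in its branch, while PCon "Pair" [PVar "x", PVar "x"] is rejected by isLinear.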

3.1 Embedding terms in patterns 

In many situations it is convenient to be able to embed terms within 
patterns. Such embedded terms do not bind variables along with the 
rest of the pattern. For example, suppose we wanted to extend our E 
language with simple let-binding, let x=e1 in e2. Here x is bound
in e2 but not in e1.

A semantically correct encoding puts the e1 in the abstract
syntax before binding x in e2, "lifting" e1 outside the binding so it
does not participate.

    type E1 = E
    type E2 = E
    data E = ...
           | Let E1 (Bind N E2)

(We use the type synonyms E1 and E2 to indicate which sub-terms
of Let correspond to e1 and e2.) However, this encoding forces
us to write the terms in an unnatural order; moreover, it fails to 
express the relationship between the name being let-bound and 
its definition. We can craft a more satisfying solution using the 
embedding combinator Embed provided by UNBOUND: 

    data E = ...
           | Let (Bind (N, Embed E1) E2)

Embed may only occur within pattern types, where it serves as
an "escape hatch" for embedding terms which do not bind any
names. Note that a term type within an Embed may itself contain
pattern types (inside the left-hand side of Bind) which may contain
Embedded term types, and so on.

3 Although UNBOUND supports shadowing, a single pattern must be linear
(i.e. not contain repeated variables).

This formulation with Embed also enables us to extend our let 
expressions to multiple bindings, by using a pattern list: 

    data E = ...
           | Let (Bind [(N, Embed E)] E)

Without Embed, we would have to encode this binding specifica- 
tion by "unzipping" the list of name-definition pairs and lifting the 
definitions outside of the Bind: 

    data E = ...
           | Let [E] (Bind [N] E)

But this example is even worse than the corresponding encoding 
of let with a single binding. Not only does it force us to use an 
unnatural order and fail to encode the relationship between names 
and their respective definitions, it also admits "junk" terms where 
the lists are different lengths. Embed makes possible an encoding 
where names and their definitions are paired, as they should be. 

3.2 Nested binding 

Consider now a let* construct, let* x1=e1, ..., xn=en in e, where
each xi is bound in e and also in all the ej with j > i. One way
to encode this pattern is by iterating the encoding for single let 
bindings discussed above: 

    data LetList = Body E
                 | Binding (Bind (N, Embed E) LetList)
    data E = ...
           | LetStar LetList

This succeeds in capturing the binding structure of let*, and 
may be sufficient for some purposes. However, it does have one 
major drawback: in order to extract the body of the let* expression, 
we must first recurse through all the bindings. It is more convenient 
to encode a let* expression by pairing a list of bindings and a body, 
so the body can be accessed without first processing the bindings. 
As a first try, we could write 

    data E = ...
           | LetStar (Bind [(N, Embed E)] E)

but this is just the multiple binding example from the previous 
subsection. With this specification, the xi are bound only in the
body of the let*, not in the definitions of subsequent variables. We 
evidently want a way to nest additional binding structure within the 
pattern of the outermost Bind. 

UNBOUND provides a novel rebinding pattern combinator for
precisely this purpose. Rebind P1 P2 acts like the pattern type
(P1, P2), except that P1 also scopes over P2, so the binders in P1
may be referred to by terms embedded within P2. (The fact that P1
scopes over P2 in this way has no effect on the pattern portion of
P2.) For example, consider the specification

    type N1 = Name E
    type N2 = Name E
    Bind (Rebind N1 (N2, Embed E1)) E2

Here N1 and N2 are bound in E2, and additionally N1 is also bound
within E1.

Using rebinding, we can faithfully encode the binding structure 
of let* as follows: 

    data Lets = Nil
              | Cons (Rebind (N, Embed E) Lets)
    data E = ...
           | LetStar (Bind Lets E)

All the names within the sequence of definitions are bound in the
body of the let*-expression (Bind Lets E); additionally, each 
name (paired with its definition as an embedded term) is bound 
within any Embeds occurring in the remainder of the sequence — 
that is, within the definitions of subsequent names. 

Telescopes A particularly important example of a binding pattern 
that requires Rebind is a telescope. Telescopes were invented by de 
Bruijn [8] to model dependently-typed systems. They are used
frequently in specification of dependently-typed languages, including
Epigram [13] and Agda [15]. 

A telescope, Δ, is a sequence of variables with their types:

$$x_1 : A_1, \ldots, x_n : A_n.$$

However, each variable scopes over the types that occur later in the
telescope. For example, here x1 may occur in A2, A3, and so on.
The name telescope comes from the optical device, which is built
as a sequence of segments that slide into one another.

Telescopes are used for "aggregate binding". For example, con- 
sider the following (very simple) fragment of a dependently-typed 
language. In this language, functions can take multiple parameters 
and be applied to multiple arguments. However, because of depen- 
dent types, the type of each parameter is allowed to mention earlier 
parameters. 

$$A, B, M, N ::= x \mid \Pi\Delta.B \mid \lambda\Delta.M \mid M\,(N_1 \ldots N_n)$$

Telescopes gather together all of the parameters of the function
in both its definition (λΔ.M) and its type (ΠΔ.B). Because
telescopes are essentially typing contexts, the typing rule for
abstractions merely appends the telescope to the current typing context:

$$\frac{\Gamma, \Delta \vdash M : B}{\Gamma \vdash \lambda\Delta.M : \Pi\Delta.B}$$

Type checking the multi-applications requires an auxiliary judg- 
ment that determines if the vector of arguments "fits into" the tele- 
scope. 

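$$\frac{}{\Gamma \vdash \cdot : \cdot} \qquad \frac{\Gamma \vdash N_1 : A \qquad \Gamma \vdash N_2 .. N_n : \Delta[x \mapsto N_1]}{\Gamma \vdash N_1\, N_2 .. N_n : (x : A, \Delta)}$$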

This judgment verifies that all of the arguments have the right types.
Computing the result type of the application requires substituting
all of the arguments for each binding variable in the telescope.

$$\frac{\Gamma \vdash M : \Pi\Delta.B \qquad \Gamma \vdash N_1 .. N_n : \Delta \qquad \mathrm{dom}(\Delta) = x_1 .. x_n}{\Gamma \vdash M\,(N_1 .. N_n) : B[x_1 .. x_n \mapsto N_1 .. N_n]}$$

This fragment demonstrates the important features of telescopes. 
Sometimes they are used as binding patterns and sometimes they 
are used as the types of vectors, independent of binding. Imple- 
menting this language using traditional unary binding would be an- 
noying because one would have to traverse the entire telescope to 
see the body of the function or the body of the dependent type. That 
is not so much of an issue in this simple example, but the semantics 
of features like inductive families (with eliminators based on induc- 
tion principles or dependent pattern matching) is greatly simplified 
by the aggregate binding that telescopes provide. 

In UNBOUND, we can represent the language fragment above 
using Rebind: 

    data E = Var N
           | Pi (Bind Tele E)
           | Lam (Bind Tele E)
           | App E [E]

    data Tele = Empty | Cons (Rebind (N, Embed E) Tele)

Furthermore, UNBOUND automatically provides all of the machinery
necessary for working with telescopes, including calculation of
their binding variables, multiple substitution in terms, and
substitution through the telescopes.

3.3 Recursive binding 

Our E language is looking nice, but what if we want to add some 
recursion? We can try to encode a letrec construct, 

letrec x1=e1, ..., xn=en in e,

where this time, the xi are bound in e as well as in all the ei. This is
straightforward if we are willing to lift all the xi out to the front:

    data E = ...
           | Letrec (Bind [N] ([E], E))

However, the problems with this sort of encoding have already been 
discussed. We would like to encode letrec in such a way that names 
and definitions are paired. 

Rebind doesn't help, because it forces us to separate binders 
from the terms over which they scope, just like Bind. We need a 
way to freely mix patterns and terms bound by the patterns in the 
same data structure. UNBOUND provides the Rec combinator for 
this purpose. In Rec P, names in the pattern P scope recursively 
over any terms embedded in P itself. However, Rec P itself is 
also a pattern, so names in P also scope externally over any term 
that binds Rec P. Intuitively, Rec is just a "recursive" version of 
Rebind. 

An appropriate encoding of letrec is therefore: 

    data E = ...
           | Letrec (Bind (Rec [(N, Embed E)]) E)

Here the pattern [(N, Embed E)] scopes over itself (hence all the
names are bound in all the definitions) as well as over the body of
the letrec.

4. Semantics 

In the previous section, we gave a number of examples of spec- 
ifying different sorts of binding patterns found in programming 
languages. However, we have been fairly vague about what those 
specifications actually mean. In this section, we fill in the details by
giving the specification language a formal semantics.

Our semantics comes in two parts. We first define the represen- 
tation of syntax with binders, and specify smart constructors and 
destructors that work with this representation. Second, we define 
the action of UNBOUND operations (α-equivalence, free variable
calculation and substitution) in terms of this representation. Once 
we have formally defined this semantics, we prove that it satisfies 
the properties we expect (§ 5) and discuss the correspondence be- 
tween it and our concrete Haskell implementation (§ 6). 

4.1 Representation 

For simplicity, we use a locally nameless representation of terms 
with binding structure. This representation provides a straightfor- 
ward semantics for UNBOUND, one that is both simple to imple- 
ment and simple to prove things about. Using a locally nameless 
representation is an old idea — we give more details about its his- 
tory in our discussion of § 8. 

A locally nameless representation of terms with binding struc- 
ture separates bound variables, represented by de Bruijn indices, 
from free variables, represented by atoms. (Atoms are often taken 
to be strings, but any countably infinite set with decidable equality 
will do.) This representation has the advantage that α-equivalence
is simply structural equality. Distinguishing bound variables from 
free variables in this way also means that we do not need to keep 
track of the current scope of a free variable and shift it as we move 
from one scope to another, as we would with a purely de Bruijn- 
indexed representation. 
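
The claim that α-equivalence becomes structural equality can be seen in a minimal sketch for the untyped lambda calculus with unary binders only. This is a simplified model written for illustration, not the representation defined below: Tm, abstract, and lam are hypothetical names, and atoms are taken to be Strings.

```haskell
-- Locally nameless untyped lambda terms with unary binders (simplified model).
data Tm
  = Free String   -- free variable, represented by an atom
  | Bound Int     -- bound variable, represented by a de Bruijn index
  | Lam Tm        -- the binder itself carries no name
  | App Tm Tm
  deriving (Eq, Show)

-- Abstract the free name x: each occurrence becomes an index counting the
-- Lams between the occurrence and the binder being introduced.
abstract :: String -> Tm -> Tm
abstract x = go 0
  where
    go k (Free y)
      | y == x    = Bound k
      | otherwise = Free y
    go _ (Bound i) = Bound i
    go k (Lam t)   = Lam (go (k + 1) t)
    go k (App a b) = App (go k a) (go k b)

-- Smart constructor for lambda abstraction.
lam :: String -> Tm -> Tm
lam x t = Lam (abstract x t)
```

With these definitions, lam "x" (Free "x") and lam "y" (Free "y") are the same value, so the derived Eq instance already decides α-equivalence, and free variables such as Free "z" never need to be shifted.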
    A ::= {x, y, z, ...}
    b ::= j@k
    t ::= x | b | K t1 ... tn | Bind p t
    p ::= _x | K p1 ... pn | Rebind p p | Embed t | Rec p



Figure 3. Syntax of atoms, indices, terms, and patterns 



The locally nameless syntax we use to represent terms with 
binding structure is shown in Figure 3. As in Figure 2, we separate 
terms from patterns. Terms t have term types T, whereas patterns 
have pattern types P. 

Names that appear in terms can either be free names, x, or bound 
names, b. Free names are drawn from the set A of atoms. (In the in- 
terest of simplicity, the semantics we describe here only includes a 
single sort for atoms; extending it to multiple atom sorts is straight- 
forward.) Terms also include applications of constructor constants 
K to zero or more subterms. Note that constructor application cov- 
ers all terms with some regular type R; in the semantics they are 
all handled in precisely the same way. Indeed, thanks to generic 
programming, this is actually a faithful reflection of our Haskell 
implementation, which handles all data constructors other than the 
special UNBOUND combinators uniformly. 

Like terms, patterns can be formed by the application of
constructors to subpatterns. Names inside patterns are binders,
written _x, which represent binding occurrences of names. We denote
binders with special syntax, _x, to emphasize that we should think
of them as placeholders with an associated name.

The astute reader may note that we are punning a bit with our 
syntax: in earlier examples, Bind and friends showed up as types, 
whereas here they are playing the role of data. The resolution of 
the apparent inconsistency is that Bind, Embed, Rebind, and Rec 
are all singleton types with eponymous constructors. 

For example, we can define an operation that lists all of the 
atoms that a pattern will bind as shown below. 

    binders :: P -> [A]
    binders _x             = [x]
    binders (K p1 ... pn)  = binders p1 ++ ... ++ binders pn
    binders (Rebind p1 p2) = binders p1 ++ binders p2
    binders (Embed t)      = []
    binders (Rec p)        = binders p

Note that even though we are using Haskell-like syntax, this
definition is type-directed. It works for any pattern of any pattern type;
the type of binders, P -> [A], is an abbreviation for ∀ P ∈ 𝒫. P ->
[A]. In our Haskell implementation, to be discussed in more detail
in § 6, each clause of a definition such as this one corresponds to a
method definition in a type class instance.
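
As an illustration only, the syntax of Figure 3 and the binders operation can be modelled as ordinary Haskell data by collapsing every regular datatype into a single constructor-application form. This first-order "universe" is an assumption made for the sketch; the actual library works over user-defined types via type classes, and the names Atom, Free, Bound, TCon, Binder, and PCon are hypothetical.

```haskell
-- A first-order model of the syntax in Figure 3 (names are illustrative).
type Atom = String

data Term
  = Free Atom          -- free name x
  | Bound Int Int      -- bound name j@k
  | TCon String [Term] -- constructor application K t1 ... tn
  | Bind Pat Term
  deriving (Eq, Show)

data Pat
  = Binder Atom        -- binding occurrence _x
  | PCon String [Pat]  -- constructor application K p1 ... pn
  | Rebind Pat Pat
  | Embed Term
  | Rec Pat
  deriving (Eq, Show)

-- The atoms bound by a pattern, left to right. Embedded terms bind nothing.
binders :: Pat -> [Atom]
binders (Binder x)     = [x]
binders (PCon _ ps)    = concatMap binders ps
binders (Rebind p1 p2) = binders p1 ++ binders p2
binders (Embed _)      = []
binders (Rec p)        = binders p
```

For example, a let*-style pattern such as Rebind (Binder "x") (PCon "Pair" [Binder "y", Embed (Free "x")]) binds x and y; the embedded occurrence of x contributes nothing.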

4.2 Names, indices and patterns 

Bound names b consist of two natural number indices, j@k. The 
first index j references a pattern, counting outwards from zero; the 
second index k, is an offset. It references a particular binder within 
the given pattern, counting from left to right, also starting from 
zero. For example, in 

    Bind (_x, _y, _z) (Bind _q 1@2)

the bound variable 1@2 refers to _z, the index-2 binder within the
index-1 enclosing pattern.

Therefore, an important part of this representation is the connec- 
tion between patterns and offsets. We make this connection precise 



with the operations nth and find (although we omit their defini- 
tions in the interest of space): 

    nth  :: P -> ℕ -> Maybe A
    find :: P -> A -> Maybe ℕ

nth takes a pattern and a natural number n and finds the nth name
bound in that pattern, failing if there are not enough binders. find
takes a pattern and a name and finds the first index of that name in
the pattern, failing if the name does not occur.
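
One plausible way to define these two operations, consistent with the description above but not taken from the paper (which omits the definitions), is to go through the list of binders. The sketch assumes a first-order model of terms and patterns in which regular datatypes are collapsed into constructor applications; all data constructor names are hypothetical.

```haskell
import Data.List (elemIndex)
import Data.Maybe (listToMaybe)

type Atom = String

data Term = Free Atom | Bound Int Int | TCon String [Term] | Bind Pat Term
  deriving (Eq, Show)

data Pat = Binder Atom | PCon String [Pat] | Rebind Pat Pat | Embed Term | Rec Pat
  deriving (Eq, Show)

-- The atoms bound by a pattern, left to right.
binders :: Pat -> [Atom]
binders (Binder x)     = [x]
binders (PCon _ ps)    = concatMap binders ps
binders (Rebind p1 p2) = binders p1 ++ binders p2
binders (Embed _)      = []
binders (Rec p)        = binders p

-- nth p n: the n-th name bound by p, counting from zero.
nth :: Pat -> Int -> Maybe Atom
nth p n
  | n < 0     = Nothing
  | otherwise = listToMaybe (drop n (binders p))

-- find p x: the offset of the first binder named x in p.
find :: Pat -> Atom -> Maybe Int
find p x = elemIndex x (binders p)
```

Defined this way, the two operations are inverse to each other on linear patterns: whenever find p x returns Just k, nth p k returns Just x.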

4.3 Open and close 

The two most important operations for the locally nameless repre- 
sentation are close and open. The former is used for binding terms: 
it converts atoms (i.e. free names) to indices (i.e. bound names). 
The latter does the reverse, replacing indices that resolve to a par- 
ticular binding location by free names. 

We call the first operation close, as we are closing the term 
with respect to the free names listed in a pattern. Likewise, we 
call the inverse operation open, as we may use it to open up 
a binder in order to recurse through its subcomponents. These 
two operations are standard components for working with locally 
nameless representation [2, 10]. Here we modify them to close and 
open terms with respect to a pattern instead of a single variable, 
and also to close and open the terms embedded inside patterns. 

The close operation is defined in Figure 4. It takes as input 
a natural number level, a pattern, and a term, and returns a new 
term where free variables matching binders in the pattern have been 
replaced by bound variables at the given level. In the free variable 
case, it uses find to look for a matching binder and generate the 
appropriate index if one is found. We also define a version of 
close for patterns, closep, whose job is to recurse through patterns 
looking for Embedded terms to which close can be applied. When 
recursing under a binder (Bind, Rebind, or Rec), both close and 
closep increment the current level. 

The open operation is also defined in Figure 4. It takes a natural 
number level, a pattern, and a term, and "opens" the term by 
interpreting bound variables at the given level as references into 
the pattern, replacing them by the free variable attached to the 
referenced binder. 4 Similarly to close, open is mutually defined 
with a pattern version openp. In the case where a bound variable
is found which matches the current level, open uses nth to index 
into the pattern and pick out the free variable associated with the 
referenced binder. 

We use close p t and open p t as convenient synonyms for
close 0 p t and open 0 p t, respectively.

4.4 Constructing terms and patterns 

Figure 5 lists the smart constructors and destructors that are part 
of the interface to the type combinators exported by UNBOUND. In 
the next two subsections, we discuss the implementations of these 
operations. 

We use close to define the constructors for binding abstractions. 
Closing a term with respect to the pattern binds the pattern variables 
in the term. 

bind p t = Bind p (close p t) 



4 In the locally nameless literature, open is sometimes defined as
bound-variable substitution and generalized to replace bound variables with terms
instead of with free variables. Such definitions save effort as often the next 
step after opening is substituting for the new free variable. In that case, 
the definition of open is a little more complex than what is presented 
here in order to deal with substituting terms with dangling bound variable 
references. We prefer our reference semantics to be as simple as possible so 
we avoid such complications. 
    close :: ℕ -> P -> T -> T
    close l p b             = b
    close l p x             = case find p x of
                                Just i  -> l@i
                                Nothing -> x
    close l p (K t1 ... tn) = K (close l p t1) ... (close l p tn)
    close l p (Bind p' t)   = Bind (closep l p p') (close (l + 1) p t)

    closep :: ℕ -> P -> P -> P
    closep l p _x             = _x
    closep l p (K p1 ... pn)  = K (closep l p p1) ... (closep l p pn)
    closep l p (Rebind p1 p2) = Rebind (closep l p p1) (closep (l + 1) p p2)
    closep l p (Embed t)      = Embed (close l p t)
    closep l p (Rec p')       = Rec (closep (l + 1) p p')

    open :: ℕ -> P -> T -> T
    open l p (j@k) | j == l = case nth p k of
                                Just x  -> x
                                Nothing -> j@k
    open l p x              = x
    open l p (K t1 ... tn)  = K (open l p t1) ... (open l p tn)
    open l p (Bind p' t)    = Bind (openp l p p') (open (l + 1) p t)

    openp :: ℕ -> P -> P -> P
    openp l p (K p1 ... pn)  = K (openp l p p1) ... (openp l p pn)
    openp l p _x             = _x
    openp l p (Rebind p1 p2) = Rebind (openp l p p1) (openp (l + 1) p p2)
    openp l p (Embed t)      = Embed (open l p t)
    openp l p (Rec p')       = Rec (openp (l + 1) p p')

Figure 4. close and open



Effectively, this replaces all free occurrences of variables that ap- 
pear in the pattern with indices. For example, binding a pair of vari- 
ables in a term that references both variables will produce indices 
that refer to the same pattern, but at different offsets. In contrast, 
nesting the binders produces indices that refer to the different bind- 
ing locations, but each one at the same offset (the zeroth variable in 
the pattern). 

    bind (_x, _y) (x, y)      = Bind (_x, _y) (0@0, 0@1)

    bind _x (bind _y (x, y))  = Bind _x (Bind _y (1@0, 0@0))

Likewise, for pattern combinators that introduce internal bind- 
ing, we also use close to replace occurrences of the bound variable 
with indices. Note that in the case of recursive binding, we close 
the pattern with respect to itself. 

    rebind p1 p2 = Rebind p1 (closep p1 p2)
    rec p        = Rec (closep p p)

Finally, embed does not need to do any closing, and merely ap- 
plies the Embed constructor to the given term. Likewise, unembed 
merely returns the nested term. 



    embed t           = Embed t
    unembed (Embed t) = t

    string2Name :: String -> Name T
    name2String :: Name T -> String
    bind        :: P -> T -> Bind P T
    unbind      :: Fresh m => Bind P T -> m (P, T)
    rebind      :: P1 -> P2 -> Rebind P1 P2
    unrebind    :: Rebind P1 P2 -> (P1, P2)
    rec         :: P -> Rec P
    unrec       :: Rec P -> P
    embed       :: T -> Embed T
    unembed     :: Embed T -> T

Figure 5. Constructors and destructors

4.5 Freshening and unbinding 

Unbinding is not quite as straightforward as binding. Given a term 
Bind p t, it is only safe to call open p t if none of the binding 
variables of p clash with existing free variables in t. Therefore, 
before opening, we must first freshen p by assigning suitably fresh
names to its binders. At this point, we leave the precise meaning of 
"suitably fresh" open to interpretation; some concrete alternatives 
are discussed in § 6.4. We omit the formal definition of freshening 
since it is straightforward: it simply walks over a pattern, assigning 
a suitably fresh name to each binder encountered, and stopping at 
occurrences of Embed, since these are not part of the pattern. In 
our implementation, freshening also returns a permutation which 
describes how the variables were renamed, but we omit that here. 

We can now define unbind as the operation that freshens the 
binding variables in the pattern and then opens the body of the 
binder with the new pattern. 

    unbind (Bind p t) = (p', open p' t)
      where freshen p -> p'

In contrast, the destructors for Rebind and Rec do not freshen 
before opening the rest of the pattern. Instead, they use the preex- 
isting names: 

    unrebind (Rebind p1 p2) = (p1, openp p1 p2)
    unrec (Rec p)           = openp p p

One might wonder: why the difference? 

The reason is that when opening a Bind, we must generate fresh 
names for its binders. However, by the time we come to opening 
a nested binding, fresh names will have already been chosen for 
its binders when the enclosing Bind was opened. Hence there 
is no need to choose new names. In fact, we must not choose 
fresh names, since there may exist corresponding free variables 
over which we have no control. For example, consider the term 
Bind (Rebind p1 p2) t, in which t may contain references to
binders in p1. If we use unbind to take it apart into Rebind p1' p2
and t', there will now be free variables in t' which match the names
on binders in p1'. Freshening p1' again when opening the Rebind
would destroy this connection, and in particular would mean that 
we could not reassemble the original term using rebind and bind. 
This is why Bind and Rebind must be distinct: if we used Bind 
everywhere, unbind would have no way of knowing whether it 
was opening a top-level Bind (which must first be freshened) or a 
nested one (which must not be). 
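
To make the freshen-then-open step concrete, here is a minimal sketch of unbind over a simplified term language. It assumes flat patterns (a list of binder names) and a hand-rolled counter monad; FreshM, fresh and freshen are names invented for this sketch, and appending a counter is only one possible reading of "suitably fresh" (it does not rule out a clash with an existing free variable that already carries the same suffix).

```haskell
-- Simplifying assumption: a pattern is a flat list of binder names.
type Pat = [String]

data Term = Free String | Bound Int Int | App Term Term | Bind Pat Term
  deriving (Eq, Show)

-- open l p t replaces indices at level l by the names found in pattern p.
open :: Int -> Pat -> Term -> Term
open l p t = case t of
  Bound j k
    | j == l, k >= 0, k < length p -> Free (p !! k)
    | otherwise                    -> Bound j k
  Free x   -> Free x
  App a b  -> App (open l p a) (open l p b)
  Bind q b -> Bind q (open (l + 1) p b)

-- Hypothetical fresh-name supply: a counter threaded through the computation.
newtype FreshM a = FreshM { runFreshM :: Int -> (a, Int) }

instance Functor FreshM where
  fmap f (FreshM g) = FreshM $ \n -> let (a, n') = g n in (f a, n')

instance Applicative FreshM where
  pure a = FreshM $ \n -> (a, n)
  FreshM f <*> FreshM g = FreshM $ \n ->
    let (h, n1) = f n
        (a, n2) = g n1
    in (h a, n2)

instance Monad FreshM where
  FreshM g >>= k = FreshM $ \n -> let (a, n') = g n in runFreshM (k a) n'

-- Append the current counter value to the base name, then bump the counter.
fresh :: String -> FreshM String
fresh x = FreshM $ \n -> (x ++ show n, n + 1)

-- Freshening walks the pattern and renames every binder.
freshen :: Pat -> FreshM Pat
freshen = mapM fresh

-- unbind freshens the pattern first and only then opens the body.
unbind :: Term -> FreshM (Maybe (Pat, Term))
unbind (Bind p t) = do
  p' <- freshen p
  return (Just (p', open 0 p' t))
unbind _ = return Nothing
```

For the term Bind ["x"] (App (Bound 0 0) (Free "x")), opening with the stored name "x" would conflate the bound occurrence with the unrelated free variable x; unbinding with a freshened pattern keeps the two apart.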

    ≈ :: T -> T -> Bool
    x ≈ y                         = x == y
    b1 ≈ b2                       = b1 == b2
    (K s1 ... sn) ≈ (K t1 ... tn) = s1 ≈ t1 ∧ ... ∧ sn ≈ tn
    (Bind p1 t1) ≈ (Bind p2 t2)   = p1 ≈p p2 ∧ t1 ≈ t2

    ≈p :: P -> P -> Bool
    _x ≈p _y                          = True
    (K p1 ... pn) ≈p (K q1 ... qn)    = p1 ≈p q1 ∧ ... ∧ pn ≈p qn
    (Rebind p1 p2) ≈p (Rebind q1 q2)  = p1 ≈p q1 ∧ p2 ≈p q2
    (Embed t1) ≈p (Embed t2)          = t1 ≈ t2
    (Rec p1) ≈p (Rec p2)              = p1 ≈p p2

    fv :: T -> Set A
    fv x             = {x}
    fv b             = ∅
    fv (K t1 ... tn) = fv t1 ∪ ... ∪ fv tn
    fv (Bind p t)    = fvp p ∪ fv t

    fvp :: P -> Set A
    fvp _x             = ∅
    fvp (K p1 ... pn)  = fvp p1 ∪ ... ∪ fvp pn
    fvp (Rebind p1 p2) = fvp p1 ∪ fvp p2
    fvp (Embed t)      = fv t
    fvp (Rec p)        = fvp p

    subst :: A -> T -> T -> T
    subst x s y | x == y    = s
                | otherwise = y
    subst x s b             = b
    subst x s (K t1 ... tn) = K (subst x s t1) ... (subst x s tn)
    subst x s (Bind p t)    = Bind (substp x s p) (subst x s t)

    substp :: A -> T -> P -> P
    substp x s _y             = _y
    substp x s (K p1 ... pn)  = K (substp x s p1) ... (substp x s pn)
    substp x s (Rebind p1 p2) = Rebind (substp x s p1) (substp x s p2)
    substp x s (Embed t)      = Embed (subst x s t)
    substp x s (Rec p)        = Rec (substp x s p)

Figure 6. α-equivalence, free variables, and substitution



4.6 Free variables, α-equivalence and substitution

Now we come to the real payoff of our representation, as we specify
the basic UNBOUND operations of α-equivalence, free variable
calculation, and capture-avoiding substitution, all shown in Figure 6.
Their specifications are entirely straightforward, and, as described
in the next section, proving things about them is not much harder!

The α-equivalence relation on terms is defined mutually with a
notion of equivalence for patterns which ignores binders and checks
that embedded terms are α-equivalent. This α-equivalence relation
is essentially structural equality; the only reason it is not precisely
structural equality is that name annotations on binders are ignored
in PEQ_BINDER.

Computing free variables is equally straightforward. Since free
and bound variables are distinguished syntactically, we need only
recurse through terms and patterns collecting all the free variables
we find.

Finally, we define substitution into terms and patterns. If we see 
the free variable we are substituting for, we replace it with the term 
being substituted; otherwise we simply recurse. We need do noth- 
ing special when recursing through binders: since free and bound 
variables are distinguished syntactically, we are in no danger of 
mistaking a bound variable for a free variable we should substitute 
for, or of accidentally capturing any free variables in the substituted 
term. 
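
The three operations can be transcribed almost literally into Haskell. The sketch below is an illustration under stated assumptions, not the library's generic implementation: App is the only term constructor besides Bind, patterns are limited to binders, pairs and Embed (Rebind and Rec are omitted), free-variable sets are represented as duplicate-free lists, and all names are local to the sketch.

```haskell
import Data.List (nub)

data Pat  = Binder String | PPair Pat Pat | Embed Term
  deriving Show

data Term = Free String | Bound Int Int | App Term Term | Bind Pat Term
  deriving Show

-- α-equivalence: structural equality, except that binder names are ignored.
aeq :: Term -> Term -> Bool
aeq (Free x)    (Free y)      = x == y
aeq (Bound j k) (Bound j' k') = j == j' && k == k'
aeq (App a b)   (App c d)     = aeq a c && aeq b d
aeq (Bind p t)  (Bind q u)    = aeqP p q && aeq t u
aeq _           _             = False

aeqP :: Pat -> Pat -> Bool
aeqP (Binder _)  (Binder _)  = True
aeqP (PPair a b) (PPair c d) = aeqP a c && aeqP b d
aeqP (Embed t)   (Embed u)   = aeq t u
aeqP _           _           = False

-- Free variables: bound variables are indices, so they are never collected.
fv :: Term -> [String]
fv (Free x)    = [x]
fv (Bound _ _) = []
fv (App a b)   = nub (fv a ++ fv b)
fv (Bind p t)  = nub (fvP p ++ fv t)

fvP :: Pat -> [String]
fvP (Binder _)  = []
fvP (PPair a b) = nub (fvP a ++ fvP b)
fvP (Embed t)   = fv t

-- Substitution: plain recursion, with no special case when going under Bind.
subst :: String -> Term -> Term -> Term
subst x s t = case t of
  Free y | x == y    -> s
         | otherwise -> Free y
  Bound j k -> Bound j k
  App a b   -> App (subst x s a) (subst x s b)
  Bind p b  -> Bind (substP x s p) (subst x s b)

substP :: String -> Term -> Pat -> Pat
substP x s p = case p of
  Binder y  -> Binder y
  PPair a b -> PPair (substP x s a) (substP x s b)
  Embed t   -> Embed (subst x s t)
```

Substituting the free variable x for z under a binder whose name hint is also x does not capture anything in this model, because the bound occurrence is an index rather than a name.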

5. Metatheory 

We've now defined a simple semantics for our pattern specification 
language in terms of a locally nameless representation. But how 
can we know whether this semantics is at all meaningful? Well, we 
prove stuff, of course! 

In the previous section, we noted how straightforward the defi- 
nitions of various operations were. Likewise, most of the proofs re- 
garding these operations are also straightforward (but full of fiddly 
details). We therefore omit most of the proofs and give only brief 
sketches of a few. The fact that the proofs are straightforward is a 
testament to the elegance of the locally nameless representation. 

There is already a lot of work to draw on for the metatheory 
of locally nameless representations in the single binder case [1,2]; 
much of the metatheory here can be seen as an extension of this 
work. 

5.1 Local closure 

One important property of the locally nameless representation is 
that only some terms are good representations. In particular, there 
is some "junk" in our representation, and we would like to know 
that we (and users of our library) never need to deal with it. The 
local closure relation in Figure 7 is an invariant for our representa- 
tion. This relation excludes terms with "dangling" bound variables. 
For example the term 0@0, a bound variable with no surrounding 
binder, is not locally closed. 5 
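
As an illustration of what the invariant rules out, the following sketch checks local closure for a simplified term language. It assumes flat patterns, and instead of opening each binder it tracks how many binders each enclosing pattern has; this generalized formulation and the names lcAt and lc are choices made for the sketch, not definitions from the paper.

```haskell
-- Simplifying assumption: a pattern is a flat list of binder names.
type Pat = [String]

data Term = Free String | Bound Int Int | App Term Term | Bind Pat Term

-- lcAt sizes t: sizes !! j is the number of binders in the pattern of the
-- j-th enclosing Bind (innermost first). An index j@k is acceptable only
-- if level j exists and offset k falls inside that pattern.
lcAt :: [Int] -> Term -> Bool
lcAt _     (Free _)    = True
lcAt sizes (Bound j k) =
  j >= 0 && j < length sizes && k >= 0 && k < sizes !! j
lcAt sizes (App a b)   = lcAt sizes a && lcAt sizes b
lcAt sizes (Bind p t)  = lcAt (length p : sizes) t

-- A term is locally closed if it has no dangling index at top level.
lc :: Term -> Bool
lc = lcAt []
```

Under this check, a bare 0@0 is rejected, while the same index underneath a Bind with at least one binder is accepted.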

By making the type combinators of our library abstract, we can 
demonstrate a central property of our interface: users of the library 
can only construct locally closed patterns and terms. 

Theorem 1 (Constructors and destructors preserve local closure). 
All exported term and pattern constructors and destructors pre- 
serve local closure. 

Proof. Requires considering the action of each constructor and 
destructor individually, appealing to a number of properties about 
the interaction of local closure, open and close. □ 

Theorem 2 (Substitution preserves local closure). 

• If LC t and LC t' then LC (subst x t t').
• If LC t and LC p then LC (substp x t p).

Proof. Straightforward induction using a generalized version of 
local closure. □ 

Lemma 3 (Freshening preserves local closure). If LC p and
freshen p -> p', then LC p'.

Proof. Easy induction; freshening only changes names on binders, 
which the LC relation ignores. □ 

Next, we show that α-equivalence is an equivalence relation that
is respected by the operations of our library.



5 This relation is an extension of McKinna and Pollack's VClosed rela- 
tion [14]. 

LC t    (t is locally closed)

    --------  LC_FREE
      LC x

    LC t1 ... LC tn
    -----------------  LC_CON
    LC (K t1 ... tn)

    LC p    LC (open p t)
    ----------------------  LC_BIND
    LC (Bind p t)

LC p    (p is locally closed)

    --------  LCP_BINDER
     LC _x

    LC p1 ... LC pn
    -----------------  LCP_CON
    LC (K p1 ... pn)

    LC p1    LC (openp p1 p2)
    --------------------------  LCP_REBIND
    LC (Rebind p1 p2)

    LC t
    -------------  LCP_EMBED
    LC (Embed t)

    LC (openp p p)
    ---------------  LCP_REC
    LC (Rec p)

Figure 7. Local closure of terms and patterns



Theorem 4. ≈ is an equivalence.

Proof. Reflexivity, symmetry, and transitivity can each be
established by straightforward induction. □

Theorem 5 (fv respects α-equivalence). If t1 ≈ t2, then fv t1 =
fv t2.

Proof. Straightforward induction on fv t1 along with a similar
proof for fvp. □

Theorem 6 (Substitution respects α-equivalence). If t1 ≈ t2 and
s1 ≈ s2, then [x ↦ s1]t1 ≈ [x ↦ s2]t2.

Proof. Straightforward induction on the derivation of t1 ≈ t2,
along with a similar proof for pattern substitution. □

We next specify how the operations of α-equivalence, free variable
calculation, and substitution interact with Bind. The proofs of
these remaining theorems rely on properties about the interactions
between close and each of the operations.

The first theorem states the interaction between binding and
α-equivalence. It states that two bindings are α-equivalent when we
can freshen two patterns to the same new result, and then show that
their bodies are α-equivalent under a consistent renaming. Below,
π1 and π2 are the permutations returned by freshen and π1 · t1 is
the application of a permutation to a term.

Theorem 7. If freshen p1 -> p, π1 and freshen p2 -> p, π2 and
π1 · t1 ≈ π2 · t2, then bind p1 t1 ≈ bind p2 t2.

The second theorem specifies the behavior of fv for binders.

Theorem 8. fv (bind p t) = fvp p ∪ (fv t - binders p).

Finally, we specify the conditions when substitutions are
permitted to commute through bindings.



    type N = Name E
    data E = Var N
           | App E E
           | Lam (Bind N E)
      deriving Show

    $(derive [''E])

    instance Alpha E

    instance Subst E E where
      isvar (Var n) = Just (SubstName n)
      isvar _       = Nothing

Figure 8. Representing the untyped lambda calculus



Theorem 9. If {x} ∪ fv t is disjoint from binders p, then
subst x t (bind p t') = bind (substp x t p) (subst x t t').

6. Implementation 

The UNBOUND specification language is implemented as an embedded
domain specific language (EDSL) in GHC Haskell, including all of
the functionality described above and more. Terms and patterns are
normal Haskell datatypes, and combinators such as Name, Bind and
Rebind are abstract types provided by our library. The implementation
of UNBOUND closely follows the semantics that we presented in the
previous section, with UNBOUND operations such as fv, subst and
· ≈ · provided for user-defined datatypes via generic programming.

Below, we give an overview of our library, first by giving a 
short example of how it may be used in a Haskell program, and 
then discussing the implementation details. Figure 9 summarizes 
the important UNBOUND operations that we discuss in this section. 

Figure 8 shows a definition of the untyped lambda calculus 
using UNBOUND. The first part of the figure is the same as in § 2. 
The rest of the figure includes the small amount of "boilerplate" 
necessary to use UNBOUND with the type E. 

We implement UNBOUND using the RepLib generic programming
library [31]. RepLib works by producing generic representation
instances for each type. Roughly, a representation instance
records the structure of a datatype by analyzing its data
declaration. In this case, the call $(derive [''E]) uses Template
Haskell [27] to generate the generic representation for E. This
structure information is used to automatically generate particular
functions over E on demand.

The following line in the figure declares E to be an instance
of the Alpha type class, which governs α-equality, free variable
and freshening operations. Happily, default implementations for the
methods of Alpha are defined generically. Guided by the occurrences
of Bind and Name in the definition of E, the default definitions
of these methods behave exactly like their specifications in
the semantics section.

Capture-avoiding substitution is governed by the Subst class,
and requires a tiny bit of work on our part: we must indicate where
variables are located in datatypes. Beyond that, the generic default
implementation suffices. In general, the type of subst declares that
values of type b may be substituted for free variables occurring in
values of type a, so the Subst E E instance shown declares that E
values may be substituted for Vars in other E values.

    · ≈ ·      :: Alpha a => a -> a -> Bool                          -- alpha equivalence
    acompare   :: Alpha a => a -> a -> Ordering                      -- alpha-respecting comparison
    fv         :: (Alpha a, Rep b, Collection c) => a -> c (Name b)  -- free names (single sort)
    fvAny      :: (Alpha a, Collection c) => a -> c AnyName          -- free names (all)
    fvp        :: (Alpha a, Rep b, Collection c) => a -> c (Name b)  -- free names in annotations (single)
    fvpAny     :: (Alpha a, Collection c) => a -> c AnyName          -- free names in annotations (all)
    binders    :: (Alpha a, Rep b, Collection c) => a -> c (Name b)  -- binding names (single sort)
    bindersAny :: (Alpha a, Collection c) => a -> c AnyName          -- binding names (all)
    freshen    :: (Alpha a, Fresh m) => a -> m (a, Perm AnyName)     -- rename with fresh variables (returns a permutation)
    swaps      :: Alpha a => Perm AnyName -> a -> a                  -- permute variables
    subst      :: Subst b a => Name b -> b -> a -> a                 -- single substitution
    substs     :: Subst b a => [(Name b, b)] -> a -> a               -- simultaneous substitution

Figure 9. Overview of selected UNBOUND operations

By making Subst a multiparameter type class we have flexibility
and safety. Imagine a different declaration of the type E which
contains both variables abstracting E, and type variables abstracting
Typ. An instance Subst E E declares that E variables can be
replaced with Es, and another instance Subst Typ E would declare
that Typ variables can be replaced with Typs inside Es. In
the latter case, we would use the default definition of isvar as there
is no way to replace Typ variables with Typs and get an E. In fact,
the type indices of Name and SubstName would not allow us to
give a definition of isvar that would confuse Typ and E variables.
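
The role of the two class parameters can be seen in a small stand-alone sketch. It is only an approximation of the idea: the instances are written by hand instead of being derived generically, there are no binders, and the Ann constructor, the Typ type and this particular Subst class are assumptions made for the illustration rather than the library's definitions.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

-- Names carry a phantom index recording the sort of thing they stand for.
newtype Name a = Name String
  deriving (Eq, Show)

data Typ = TVar (Name Typ) | Arr Typ Typ
  deriving (Eq, Show)

-- Hypothetical expression type with both term and type variables.
data E = Var (Name E) | App E E | Ann E Typ
  deriving (Eq, Show)

-- Subst b a: values of type b may replace b-indexed names inside an a.
class Subst b a where
  subst :: Name b -> b -> a -> a

instance Subst Typ Typ where
  subst x s (TVar y)
    | x == y    = s
    | otherwise = TVar y
  subst x s (Arr a b) = Arr (subst x s a) (subst x s b)

instance Subst E E where
  subst x s (Var y)
    | x == y    = s
    | otherwise = Var y
  subst x s (App a b) = App (subst x s a) (subst x s b)
  subst x s (Ann e t) = Ann (subst x s e) t  -- types contain no E variables

instance Subst Typ E where
  subst _ _ (Var y)   = Var y  -- a Typ can never replace an E variable
  subst x s (App a b) = App (subst x s a) (subst x s b)
  subst x s (Ann e t) = Ann (subst x s e) (subst x s t)
```

Because the index of the name must match the type of the replacement, an attempt to substitute an E for a Name Typ is rejected by the type checker, and a term variable that happens to share its string with a type variable is left alone.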

The operations fv, bind, unbind and subst are implemented in 
terms of the Alpha and Subst type classes. Therefore, Figure 8 
provides all the necessary definitions for the parallel reduction 
example in Figure 1 (which is valid Haskell code). 

6.1 Multi-sorted names and AnyName 

Instead of a single homogeneous set of atomic names, UNBOUND
has multisorted names. Consider the type declarations in Figure 9.
Names are indexed by a type, and the type of subst ensures that
only things of type t may be substituted for t-indexed names. The
fv, fvp, and binders functions are also polymorphic in their
result type, ignoring names whose type index does not match the
requested result type. In this way, one may calculate, say, just the
free term variables or just the free type variables from an
expression. These functions are overloaded, so type inference determines
precisely what sort of names will be calculated, and what sort of
data structure (list, set, etc.) will be used to collect them.

However, sometimes we would like to know all free names, no 
matter what their sort. Therefore, UNBOUND also provides the type 
AnyName, which existentially hides the type index on a name, and 
the functions fvAny and bindersAny which return all appropriate 
names wrapped in AnyName constructors. 
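
One way to picture the relationship between sorted names and AnyName is the sketch below. It assumes Typeable as the mechanism for recovering a hidden index (the library itself relies on RepLib's Rep class), and Exp, Typ and ofSort are hypothetical names introduced only for this example.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.Maybe (mapMaybe)
import Data.Typeable (Typeable, cast)

-- Names indexed by the sort of thing they stand for.
newtype Name a = Name String
  deriving (Eq, Show)

-- Empty types used purely as sort indices in this sketch.
data Exp
data Typ

-- AnyName hides the index but remembers enough to recover it later.
data AnyName = forall a. Typeable a => AnyName (Name a)

-- Keep only the names whose hidden index matches the requested sort,
-- mirroring how a sort-specific result type selects among all names.
ofSort :: Typeable b => [AnyName] -> [Name b]
ofSort = mapMaybe (\(AnyName n) -> cast n)
```

The requested sort is chosen by the type at which ofSort is used, so the same list of AnyName values yields the term names at type [Name Exp] and the type names at type [Name Typ].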

6.2 The Alpha type class 

One way in which our implementation differs from our semantics
is that while the semantics statically differentiates between
terms and patterns, the implementation does not. RepLib
limitations mean that instead of having two type classes Term a and
Pattern a, we must have a single type class Alpha a which
serves both purposes. This conflation means that our implementation
cannot statically prevent meaningless types which use a pattern
as a term (e.g. Embed (Embed N)) or a term as a pattern (e.g.
Bind (Bind N E) E) from being used in a binding specification. 6

However, this conflation does have an advantage. The operations
of the Alpha class are actually parameterized by a mode
which determines whether the type is being used as a term or a
pattern. For example, the parameterized free variable function fv'
in the Name instance collects the Name in term mode (because it
is a free name) but ignores it in pattern mode (because it is part
of a pattern binding). Many types (such as products and sums) are
parametric in the mode and use the same behavior in both cases.

6 We do, however, provide dynamic checks isPat and isTerm that can be
used to ensure the invariants are maintained.

6.3 Specific instances for Alpha 

Default implementations for Alpha methods are defined via generic 
programming, but they may be overridden for greater control or 
customization. This capability is necessary in practical uses of 
UNBOUND for specific types. 7 

For example, suppose we would like to tag variables with source 
position information in our abstract syntax: 

    data E = ...
           | Var SourcePos N

To make E an instance of Alpha, we need SourcePos to also be
an instance of Alpha, because it appears inside the E type. If we
would like α-equivalence to ignore source positions, we can simply
override the default definition of α-equivalence for SourcePos.
By identifying all source positions as equivalent, we ensure that
expressions appearing in different positions can still be determined
to be α-equivalent. 8

    instance Alpha SourcePos where
      aeq' _ _ _ = True
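
The effect of such an override can be imitated without the library. The sketch below uses a hypothetical one-method class Aeq in place of Alpha, drops the mode argument, and writes the instance for E by hand where UNBOUND would derive it generically.

```haskell
data SourcePos = SourcePos Int Int  -- line and column
  deriving Show

data E = Var SourcePos String | App E E
  deriving Show

-- Hypothetical stand-in for the α-equivalence method of Alpha.
class Aeq a where
  aeq :: a -> a -> Bool

-- The override: every source position is equivalent to every other one.
instance Aeq SourcePos where
  aeq _ _ = True

-- Written by hand here; it simply defers to the instance for each field.
instance Aeq E where
  aeq (Var p x) (Var q y) = aeq p q && x == y
  aeq (App a b) (App c d) = aeq a c && aeq b d
  aeq _         _         = False
```

With this instance, two occurrences of the same variable at different positions compare as equivalent, while different variables still do not.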

6.4 Freshness monads 

Since the freshen operation relies on the generation of fresh names, 
operations which make use of it (such as unbind) must execute 
within a monad. The Fresh type class, shown in Figure 10, governs 
those monads which can be used for this purpose, and is used to 
avoid tying users down to one particular concrete monad. 

The Fresh class is quite simple: it requires only a single op- 
eration fresh, which takes a name as input and generates a new 
name (based on the given name) which is guaranteed to be "glob- 
ally fresh" in some appropriate sense. For example, a simple con- 
crete implementation might keep track of a global counter which is 
incremented every time fresh is called, appending the new counter 
value to the given base variable name. 

However, this is unsatisfactory in many instances. For example,
an implementation of a pretty-printer for the lambda calculus based
on fresh might format the term λ_x λ_y λ_z (2@0 1@0) 0@0 as
λx1 -> λy2 -> λz3 -> (x1 y2) z3. We can see perfectly
well that the numeric suffixes are unnecessary, since the existing
names do not clash, but the fresh operation does not have enough
information to make this assessment.

7 This capability for overriding generic functions is inspired by the SYB3
library [12].

8 The first argument to aeq' is the mode information mentioned earlier.

    class Monad m => Fresh m where
      fresh :: Name a -> m (Name a)

    class Monad m => LFresh m where
      lfresh :: Rep a => Name a -> m (Name a)
      avoid  :: [AnyName] -> m a -> m a

Figure 10. The Fresh and LFresh type classes

For this reason, we also provide the more sophisticated LFresh
type class for monads which can generate locally fresh names.
lfresh works much like fresh, but it has more to go on: avoid ns m
allows us to specify that the names ns should be avoided by lfresh
in the subcomputation m. Unlike fresh, lfresh is not guaranteed
to pick globally fresh names; it only guarantees not to pick names
proscribed by an enclosing call to avoid. The intention is that it will
return its argument unchanged when that name is not specifically
to be avoided.

Using LFresh, a pretty-printer for the lambda calculus can be 
written which uses avoid every time it recurses under a binder, so 
that names are chosen fresh with respect to exactly those names 
currently in scope. 
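
A pretty-printer of this kind might look roughly as follows. This is a sketch under several assumptions: names are plain strings, each Lam binds a single variable (so a bound variable is a bare de Bruijn level rather than j@k), and LFreshM is a hand-written reader-style monad standing in for whichever LFresh monad a client would actually use.

```haskell
-- Locally nameless lambda terms; each binder keeps a name hint.
data Term = Free String | Bound Int | App Term Term | Lam String Term

-- Reader-style monad carrying the names that must be avoided.
newtype LFreshM a = LFreshM { runLFreshM :: [String] -> a }

instance Functor LFreshM where
  fmap f (LFreshM g) = LFreshM (f . g)

instance Applicative LFreshM where
  pure = LFreshM . const
  LFreshM f <*> LFreshM g = LFreshM (\u -> f u (g u))

instance Monad LFreshM where
  LFreshM g >>= k = LFreshM (\u -> runLFreshM (k (g u)) u)

-- Return the hint unchanged unless it is to be avoided; otherwise try
-- numeric suffixes until an unused candidate is found.
lfresh :: String -> LFreshM String
lfresh x = LFreshM (\used -> try used x (1 :: Int))
  where
    try used c i
      | c `elem` used = try used (x ++ show i) (i + 1)
      | otherwise     = c

-- Extend the set of names to avoid within a subcomputation.
avoid :: [String] -> LFreshM a -> LFreshM a
avoid ns (LFreshM f) = LFreshM (\used -> f (ns ++ used))

-- Replace the index that refers to the binder just removed.
open :: String -> Term -> Term
open x = go 0
  where
    go l t = case t of
      Bound j | j == l    -> Free x
              | otherwise -> Bound j
      Free y  -> Free y
      App a b -> App (go l a) (go l b)
      Lam h b -> Lam h (go (l + 1) b)

fv :: Term -> [String]
fv (Free x)  = [x]
fv (Bound _) = []
fv (App a b) = fv a ++ fv b
fv (Lam _ b) = fv b

pretty :: Term -> LFreshM String
pretty (Free x)  = return x
pretty (Bound j) = return ("#" ++ show j)  -- only for ill-formed input
pretty (App a b) = do
  a' <- pretty a
  b' <- pretty b
  return ("(" ++ a' ++ " " ++ b' ++ ")")
pretty (Lam hint b) = do
  x    <- lfresh hint
  body <- avoid [x] (pretty (open x b))  -- x is now in scope
  return ("\\" ++ x ++ " -> " ++ body)

-- Start by avoiding the free variables of the whole term.
prettyTop :: Term -> String
prettyTop t = runLFreshM (avoid (fv t) (pretty t)) []
```

For binders whose hints do not clash, the printed names are the hints themselves with no numeric suffixes; a suffix appears only when a hint coincides with a name already in scope.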

UNBOUND provides several concrete implementations for Fresh 
and LFresh, including transformer versions for adding fresh name 
generation capabilities to existing monads, and instances allow- 
ing their use with all the standard monad transformers in the 
transformers package. 9 

6.5 Simultaneous unbinding 

Up to now, we have seen only examples of opening a single
abstraction with arbitrary fresh names. However, some situations require
simultaneously opening two or more abstractions with the same
fresh names. For example, in order to check the convertibility of
two LF Π-types, we must open them with the same fresh name and
recursively check the convertibility of the bodies.

In order to simultaneously open two abstractions Bind p1 t1
and Bind p2 t2, we require only that p1 and p2 have the same
number of binders. Requiring a stronger match between p1 and p2
would be unnecessarily limiting. For example, continuing our LF
checking example, p1 and p2 might contain type annotations which
are convertible but not α-equivalent.

Therefore, UNBOUND provides a function unbind2 that simul- 
taneously opens two related bindings. 

    unbind2 :: (Fresh m, Alpha p1, Alpha p2, Alpha t1, Alpha t2) =>
               Bind p1 t1 -> Bind p2 t2 -> m (Maybe (p1, t1, p2, t2))
    unbind2 (B p1 t1) (B p2 t2) = do
      case mkPerm (bindersAny p1) (bindersAny p2) of
        Just π -> do
          (p1', π') <- freshen p1
          return (Just (p1', open p1' t1,
                        swaps (π' ∘ π) p2, open p1' t2))
        Nothing -> return Nothing

9 http://hackage.haskell.org/package/transformers

This function works by first matching the binding variables of the
two patterns together to create a permutation π. This operation
will fail if the patterns bind different numbers of variables. Next,
it freshens the first pattern p1 and uses the result to open t1 and t2.
Finally, it must compose the permutation from freshening p1 with
that from the match, and use the new permutation to rename the
second pattern.
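
The essence of simultaneous unbinding can be sketched for flat patterns as follows. The sketch takes liberties that should be kept in mind: the fresh-name supply is an explicit counter argument rather than a Fresh monad, patterns carry no annotations (so there is no second pattern to rename and no permutation to compose), and only one freshened pattern is returned.

```haskell
-- Simplifying assumption: a pattern is a flat list of binder names.
type Pat = [String]

data Term = Free String | Bound Int Int | App Term Term | Bind Pat Term
  deriving (Eq, Show)

-- open l p t replaces indices at level l by the names found in pattern p.
open :: Int -> Pat -> Term -> Term
open l p t = case t of
  Bound j k
    | j == l, k >= 0, k < length p -> Free (p !! k)
    | otherwise                    -> Bound j k
  Free x   -> Free x
  App a b  -> App (open l p a) (open l p b)
  Bind q b -> Bind q (open (l + 1) p b)

-- Open two bindings with the same fresh names, derived from the first
-- pattern's hints and a counter n. Fails if the binder counts differ.
unbind2 :: Int -> Term -> Term -> Maybe (Pat, Term, Term)
unbind2 n (Bind p1 t1) (Bind p2 t2)
  | length p1 == length p2 = Just (p', open 0 p' t1, open 0 p' t2)
  where
    p' = zipWith (\x i -> x ++ show i) p1 [n ..]
unbind2 _ _ _ = Nothing
```

Opening the two identity functions Bind ["x"] (0@0) and Bind ["y"] (0@0) this way produces the same free variable in both bodies, so the bodies can then be compared directly.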

7. Discussion 

7.1 Nominal semantics 

We have presented a semantics for the UNBOUND specification lan- 
guage in terms of a locally nameless representation, but this is not 
our only possible choice. UNBOUND could also be specified via an 
equivalent nominal semantics [17], and we are working in parallel 
on a nominal-style Haskell implementation. Such an alternative se- 
mantics would provide differences in running time/space, but oth- 
erwise would behave identically to the locally nameless version. In 
future work, we plan to formalize the precise connection between 
the two formulations and prove their equivalence with respect to 
the abstract interface provided by the library. 

Although a nominal semantics might appear more natural to 
think about, in our experience the locally nameless semantics is far 
easier to understand when it comes to generalized binding patterns, 
especially with nesting. Therefore, an important contribution of this 
work is the identification of a simple semantics for pattern binding. 

7.2 Cαml-style specifications

François Pottier's Cαml system [19] features a single-argument
abstraction constructor, inside which patterns and terms (both bound
and unbound) can be mixed. Directly inside an abstraction is a
pattern, with terms embedded via outer (indicating a term outside the
scope of the pattern) or inner (indicating that the pattern binds
names in the term). Cαml's abstraction constructor ⟨p⟩ is easily
definable with UNBOUND as Bind (Rec p) (). Within that pattern,
occurrences of Embed are analogous to occurrences of inner in
Cαml. To account for Cαml's outer scope specification, we
generalize the Embed combinator by adding a natural number subscript.
When encountering Embed_n while doing an open or close
operation, we decrement the level by n. Hence, Embed_0 corresponds
to the original Embed, and Embed_1 corresponds to Cαml's outer
construct, since it shifts the scope of an embedded term out to the
next enclosing level.

In UNBOUND we implement indexed embedding by adding a
new type combinator Shift P. This type increments the index of
its argument, so Shift (Embed T) corresponds to Embed_1 T, and
Shift (Shift (Embed T)) is Embed_2 T. Operationally, all Shift
does is decrement the binding level when the pattern is opened or
closed so that variables will resolve to an outer scope.

7.3 Unbound in practice 

We have been using UNBOUND in the TRELLYS project, in 
the context of type checking and evaluation of an experimental 
dependently-typed language. In this context, UNBOUND support 
for telescopes is essential. Our experience with TRELLYS has al- 
lowed us to find and correct a few bugs in our implementation of 
UNBOUND, but for the most part the use of UNBOUND has been 
unremarkable, in the sense that it seems to "just work". 

TRELLYS is an ideal client for UNBOUND, in that it is a 
prototype language with a greater emphasis on semantics than 
performance. Because UNBOUND is implemented using generic 
programming, it will be slower than a hand-coded implementa- 
tion [23]. If necessary, we could easily replace the generic defi- 
nitions with hand-coded operations by overriding the Alpha type 
class instances. 

The locally nameless representation does have some performance
concerns with respect to its use of open and close. While
the standard operations fv, subst, and · ≈ · are linear in the size
of terms, operations that are defined in the freshness monad must
open and close the terms for each binding level, which could be
expensive. Although we have not had any difficulties of this sort
in TRELLYS (our experiments are still small), it is possible that
with larger programs and deeper binding depths, these operations
could dominate. To mitigate this difficulty, UNBOUND supports an
"experts-only" interface, where critical operations can be written
directly over the terms (in a manner similar to our implementations
of fv, subst and · ≈ ·).

We have not explored the interaction of UNBOUND with standard
optimizations [26]. For example, by caching free names, an
implementation of substitution could stop early if the name being
substituted for is not cached. If we remove names from bound
patterns (which are preserved only for error messages), the locally
nameless representation interacts nicely with hash-consing, as all
α-equivalent terms have the same representation.

We are also working on bringing our nominal implementation
of UNBOUND up to date with our reference locally nameless
implementation. Both implementations provide the same interface to
clients. When using the LFresh monad, the nominal implementation
could avoid freshening when unbinding patterns if the patterns
were already "sufficiently" fresh. However, there are trade-offs
involved; for example, α-equivalence can be more expensive with the
nominal version.

One important contribution of UNBOUND is that it provides an 
abstract interface to name binding. Clients can write their code 
against this interface, and, depending on their particular applica- 
tion, choose the most appropriate implementation. Importantly, our 
locally nameless implementation provides a simple reference se- 
mantics for this interface, and alternative implementations may use 
this semantics to prove their correctness. 

8. Related Work 

Locally nameless representation The locally nameless represen- 
tation dates back to the introduction of de Bruijn indices, and is 
mentioned in the conclusion of de Bruijn's paper [7]. The key idea 
is even older. It rests on the separation of names into two distinct 
classes: variables (for locally bound variables) and parameters (for 
free, or globally bound variables) and goes back to Kleene [11], 
Prawitz [22] and Gentzen [9]. The full history of the locally name- 
less representation is outside the scope of this paper, but we refer to 
Aydemir et al. [2] and Charguéraud [4], which discuss it in detail.
Instead, we focus on the interaction between this representation, 
generic programming and binding specifications. 

Charguéraud's paper [4] also gives several examples of locally
nameless representations of languages with specific binding forms,
including binding a list of variables, pattern matching (with embed- 
ded terms), and recursive bindings. However, he does not consider 
a compositional framework for describing generalized binding. 

Zappa Nardelli's locally nameless backend [32] for the OTT 
tool [25] automatically generates definitions for the Coq proof as- 
sistent given a specification of a language (with single binding 
only). These definitions include a locally nameless representation 
of the syntax, open and close operations, α-equivalence,
substitution, and free variable calculation. The LNgen tool of Aydemir and
Weirich [1] augments this output with generic proofs about this 
representation, including many of the properties of § 5. 

Tools for general bindings The OTT tool provides an expressive
specification language for generalized binding in programming
languages. In conjunction with a specification of the abstract syntax,
OTT allows the definition of bindspecs: functions that arbitrarily
select the binding variables that appear in terms and bind them
elsewhere in the abstract syntax. Its authors give a semantics of this
specification language using a representation with concrete
variable names [24] and show that under appropriate conditions, their
concrete substitution functions respect α-equivalence and coincide
with capture-avoiding substitution.

Inspired by the OTT specification language, Urban and Kaliszyk 
recently extended the Nominal Isabelle proof assistant with sup- 
port for general bindings [30]. Their system works by using the 
OTT binding specifications (with some restrictions) to define
α-equivalence classes of syntax with binders, which they use to model
nominal-logic specifications. While a direct comparison is diffi- 
cult, their restrictions prevent variables from being bound by two 
different bindspecs, which seems necessary for telescopes. On the 
other hand, they also add two forms of set bindings to their speci- 
fications, allowing binders to be equivalent up to permutation and 
weakening of their patterns. 

As discussed in § 7.2, there is a close connection between
UNBOUND and François Pottier's Cαml system [19], based on a
nominal semantics for binding. One major difference is that Cαml
explicitly does not include support for nested binders. Another
difference is that Cαml is an external tool that performs a preprocessing
step, whereas UNBOUND is a library. However, this is not a
fundamental difference; Cαml could be made into a library as well if
OCaml had better support for generic programming (likewise,
UNBOUND could be ported to languages without support for generic
programming by making it into an external tool).

Cheney's FreshLib [6] is a Haskell library which served as an 
inspiration for UNBOUND. Like UNBOUND, it uses generic
programming to automatically define α-equivalence and substitution
functions (although FreshLib is based on a nominal semantics for 
name binding, so the generic operations that establish its semantics 
are different). FreshLib also supports some forms of generalized 
binding, but does not give a generic treatment of patterns. 

Other tools based on nominal logic include FreshML [29] and 
FreshOCaml [28]. However, they support only limited forms of binding 
patterns, which do not include embeddings or recursive or nested 
bindings. Likewise, the Haskell Nominal Toolkit [3] is a library 
that supports single binding for a fixed term structure. 

9. Future work 

Although we believe UNBOUND is useful in its current state, there 
are several directions in which we would like to extend it. 

Other forms of "exotic" binding Cheney [5] gives a catalogue of 
"exotic" binding, renaming, and structural congruence situations. 
Although UNBOUND can express many of these examples, we hope 
to extend UNBOUND with better support for "global" binding, such 
as that used for objects and modules. Furthermore, we would also 
like to add support for set binding similar to Nominal Isabelle [30]. 
However, implementing unbind2 in the presence of such bindings 
is a nontrivial task. 

User-defined names The current implementation of our library 
imposes String as the type of atomic names. However, there is 
nothing particularly special about String; in theory, any type with 
decidable equality could be used. Inspired by Cheney's Fresh- 
Lib [6], we intend to explore extending our library to support the 
use of arbitrary user-supplied types as atomic names. We have so 
far avoided this generalization to simplify the definition of the Alpha 
type class. On the other hand, the additional flexibility may also 
help us to share code among our different implementations. 
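
As a rough indication of why decidable equality on names is enough in principle, here is a minimal sketch of α-equivalence over a term type that is parameterized by its name type. It does not use the UNBOUND library or its Alpha class. The type Term n and the function aeq are hypothetical names introduced only for this illustration, and the sketch covers single-variable binders rather than UNBOUND's patterns.

```haskell
-- Lambda terms over an arbitrary name type n (illustrative only).
data Term n
  = Var n
  | Lam n (Term n)
  | App (Term n) (Term n)

-- α-equivalence using nothing but equality on names. The environment
-- pairs up the binders crossed so far on each side, innermost first.
aeq :: Eq n => Term n -> Term n -> Bool
aeq = go []
  where
    go env (Var x) (Var y)         = matchVars env x y
    go env (Lam x t) (Lam y u)     = go ((x, y) : env) t u
    go env (App t1 t2) (App u1 u2) = go env t1 u1 && go env t2 u2
    go _ _ _                       = False

    -- Two variables match if they are bound by the same pair of
    -- binders, or if both are free and carry the same name.
    matchVars [] x y = x == y
    matchVars ((a, b) : rest) x y
      | a == x || b == y = a == x && b == y
      | otherwise        = matchVars rest x y
```

The same function works unchanged at different name types: aeq (Lam "x" (Var "x")) (Lam "y" (Var "y")) uses String names, while aeq (Lam (1 :: Int) (Var 1)) (Lam 2 (Var 2)) uses Int names.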

Better static distinction between names and patterns As dis- 
cussed in § 6.2, an infelicity of our current design is the ability 
to get terms and patterns mixed up. UNBOUND inherits this limita- 
tion from RepLib; it is possible that an alternative framework for 
generic programming would perform better in this respect. 

Scoping UNBOUND does not keep track of the scope of names 
once they have been unbound. While this leads to a familiar and 
flexible interface, it does not rule out bugs that could occur from 
names escaping their scope. Pottier et al. have made some progress 
in this respect [20, 21], and we would like to explore a variant 
interface for UNBOUND that provides this tighter control. 

Mechanized metatheory The UNBOUND specification language 
seems ideal for incorporation into tools like Ott, LNgen and Nom- 
inal Isabelle that assist in the formalization of programming lan- 
guage metatheory. Indeed, locally nameless representations have 
already proved useful for that sort of reasoning. 
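
For readers unfamiliar with the locally nameless style, the following is a minimal sketch for the untyped lambda calculus with single-variable binders. It is not the UNBOUND implementation or its formal semantics. The names Term, closeTerm, openTerm and lam are assumptions made for this illustration.

```haskell
-- Locally nameless terms: free variables carry names, bound variables
-- are de Bruijn indices, and binders carry no name at all.
data Term
  = FVar String
  | BVar Int
  | Lam Term
  | App Term Term
  deriving (Eq, Show)

-- closeTerm x t: replace free occurrences of x in t by the index that
-- refers to a binder about to be placed around t.
closeTerm :: String -> Term -> Term
closeTerm x = go 0
  where
    go k (FVar y)
      | y == x    = BVar k
      | otherwise = FVar y
    go _ (BVar i)  = BVar i
    go k (Lam t)   = Lam (go (k + 1) t)
    go k (App t u) = App (go k t) (go k u)

-- openTerm u t: replace the index bound by the binder just removed from
-- around t with u. This sketch assumes u has no dangling indices.
openTerm :: Term -> Term -> Term
openTerm u = go 0
  where
    go _ (FVar y)   = FVar y
    go k (BVar i)
      | i == k      = u
      | otherwise   = BVar i
    go k (Lam t)    = Lam (go (k + 1) t)
    go k (App t u') = App (go k t) (go k u')

-- Smart constructor: build a binder from a named body.
lam :: String -> Term -> Term
lam x t = Lam (closeTerm x t)
```

Because binders carry no names, the derived Eq instance already decides α-equivalence in this sketch: lam "x" (FVar "x") and lam "y" (FVar "y") are both Lam (BVar 0).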

10. Conclusion 

UNBOUND is an expressive specification language for generalized 
binding structures, defined with a simple compositional semantics, 
proven correct, and immediately available to the GHC user com- 
munity. Because it supports the rapid development of typecheckers, 
compilers, and interpreters, it is a valuable tool for exploration in 
programming language design. Furthermore, we hope that the de- 
sign of UNBOUND itself will be a model for future work on library 
support for expressive binding structure. 